Spatial Analysis and Statistical Modeling with R and spmodel - 2
2024 Society for Freshwater Science Conference
Ryan Hill
EPA (USA)
Michael Dumelle
EPA (USA)
2024-06-02
GIS in R
Maintaining all analyses within a single software environment (R) can greatly simplify your research workflow. In this section, we’ll cover the basics of doing GIS in R.
Goals and Motivation
Understand the main features and types of vector data.
Generate point data from a set of latitudes and longitudes, such as from field sites.
Read, write, query, and manipulate vector data using the sf package.
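As a quick preview of the second goal, a table of site coordinates can be converted to spatial points with sf. This is a minimal sketch; the site names and coordinates below are invented for illustration:

```r
library(sf)
# Hypothetical field sites (coordinates invented for illustration)
sites <- data.frame(site = c('A', 'B'),
                    longitude = c(-123.26, -122.68),
                    latitude = c(44.56, 45.52))
# Convert to sf points; crs = 4269 is NAD83 latitude/longitude
sites_sf <- st_as_sf(sites, coords = c('longitude', 'latitude'), crs = 4269)
```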
library(terra)
r <- rast(ncol = 10, nrow = 10)
r[] <- runif(n = ncell(r))
r
class : SpatRaster
dimensions : 10, 10, 1 (nrow, ncol, nlyr)
resolution : 36, 18 (x, y)
extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84)
source(s) : memory
name : lyr.1
min value : 0.002414842
max value : 0.977768321
Raster data
plot(r)
Basic raster in R.
We can access data from specific locations within a raster
# Access data from the ith location in a raster
r[12]
lyr.1
1 0.6787097
# Access data based on row and column
r[2, 2]
lyr.1
1 0.6787097
Rasters can be stacked, which makes them very efficient to work with
# Create 2 new rasters based on raster r
r2 <- r * 50
r3 <- sqrt(r * 5)
# Stack rasters and rename to be unique
s <- c(r, r2, r3)
names(s) <- c('r1', 'r2', 'r3')
plot(s)
Working with real data
There are several packages for accessing geospatial data from the web
We will use the FedData package, but numerous other packages exist to access data within and outside the U.S.
One useful example is the elevatr package for accessing elevation data around the world
We will walk through an example of extracting information from a raster using a polygon layer. To do this we will:
Select just Corvallis from oregon_cities
Add a 10,000m buffer
Download National Elevation Data
Transform the projection system of the elevation data to match Corvallis
Calculate the average elevation within 10km of Corvallis
Buffer Corvallis
library(FedData)
# Select just Corvallis and calculate a 10,000-m buffer
corvallis <- oregon_cities %>%
  filter(cities == 'Corvallis') %>%
  st_buffer(10000)
Download NED based on Corvallis buffer
# Download national elevation data (NED)
ned <- FedData::get_ned(template = corvallis, label = "corvallis")
Transform the CRS
ned <- terra::project(ned, 'epsg:5070', method = 'bilinear')
Mean elevation within polygon
# zonal function in terra to calculate zonal statistics
terra::zonal(ned,
             # Need to convert corvallis `sf` object to terra vector
             terra::vect(corvallis),
             # Metric to be calculated
             mean, na.rm = TRUE)
Layer_1
1 119.6678
Your Turn
Read in U.S. cities with data('us.cities') from the maps library
Select the city of your choice and buffer it by 10 km. (We suggest converting to an equal-area projection first)
Read in National Elevation Data for your city with the FedData package
Transform CRS of elevation data to match city
Calculate the mean elevation within 10km of your city
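One possible solution, sketched with Portland, OR as the city. City names in us.cities include the state abbreviation, so check us.cities$name for the exact spelling of your own city before filtering:

```r
library(maps)
library(dplyr)
library(sf)
data('us.cities')
# Convert cities to sf points and project to an equal-area CRS (EPSG:5070)
cities_sf <- st_as_sf(us.cities, coords = c('long', 'lat'), crs = 4326) %>%
  st_transform(5070)
# Select a city and buffer by 10 km (10,000 m)
city <- cities_sf %>%
  filter(name == 'Portland OR') %>%
  st_buffer(10000)
# Download NED for the buffer and match its CRS
ned <- FedData::get_ned(template = city, label = 'portland')
ned <- terra::project(ned, 'epsg:5070', method = 'bilinear')
# Mean elevation within 10 km of the city
terra::zonal(ned, terra::vect(city), mean, na.rm = TRUE)
```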
Characterizing watersheds is fundamental to much of our work in freshwater science
Although it is more than we can cover in today’s workshop, we want you to be aware that there are several options for delineating watersheds in R
We’ll provide two examples of how to delineate watersheds within the conterminous U.S. using two online services
USGS StreamStats
The USGS’s StreamStats is an online service and map interface that allows users to navigate to a desired location and delineate a watershed boundary with the click of a mouse:
nhdplusTools is an R package that can access the Network Linked Data Index (NLDI) service, which provides navigation and extraction of NHDPlus data: https://doi-usgs.github.io/nhdplusTools/
nhdplusTools includes network navigation options as well as watershed delineation
The delineation method differs from StreamStats; it is based on National Hydrography Dataset catchment IDs
nhdplusTools
library(nhdplusTools)
# Simple feature option to generate point without any other attributes
cal_pt <- st_sfc(st_point(c(longitude, latitude)), crs = 4269)
# Identify the network location (NHDPlus common ID or COMID)
start_comid <- nhdplusTools::discover_nhdplus_id(cal_pt)
# Combine info into list (required by NLDI basin function)
ws_source <- list(featureSource = "comid", featureID = start_comid)
cal_ws2 <- nhdplusTools::get_nldi_basin(nldi_feature = ws_source)
nhdplusTools works by walking a network of pre-staged sub-catchments with unique IDs (COMIDs)
Other watershed delineation methods
Outside of the U.S., these tools are not available
It is still possible to delineate a custom watershed from raw DEM data with help from the whitebox R package
# Logan River Watershed
latitude <- 41.707
longitude <- -111.855
# Define the lat/lon
start_point <- st_sfc(st_point(c(longitude, latitude)), crs = 4269)
# Find COMID of this point
start_comid <- nhdplusTools::discover_nhdplus_id(start_point)
# Create a list object that defines the feature source and starting COMID
ws_source <- list(featureSource = "comid", featureID = start_comid)
# Delineate basin
logan_ws <- nhdplusTools::get_nldi_basin(nldi_feature = ws_source)
Solution
Map the Logan River watershed
mapview::mapview(logan_ws)
StreamCatTools
Why StreamCat/LakeCat?
Examples covered are for single watersheds - but we often need data for hundreds or thousands
Overlaying watersheds onto numerous landscape rasters/layers is a complicated, time-consuming process
StreamCat/LakeCat solves this issue by providing summarized data for all streams and lakes in the NHDPlus (medium resolution)
Let’s revisit the Calapooia watershed and calculate percent row crop for 2019 with FedData
StreamCat and the StreamCatTools package provide access to the same information with much less code because StreamCat pre-stages watershed metrics for each NHDPlus catchment
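A sketch of the StreamCatTools request, assuming calapooia_comid holds the watershed's NHDPlus COMID found during the earlier delineation (the metric name follows the PctCropYYYY pattern used later in this section; consult the metric list to confirm):

```r
library(StreamCatTools)
# Percent row crop (2019) for the watershed draining to this COMID
sc_get_data(comid = calapooia_comid,
            aoi = 'watershed',
            metric = 'PctCrop2019')
```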
StreamCat also provides metrics at several scales
Figure 5: Geospatial framework of the StreamCat Dataset
Histogram of crop as a % of watersheds within Iowa in 2019.
We can provide multiple metrics to the request by separating them with a comma.
Note that the request is provided as a single string, 'PctCrop2001, PctCrop2019', rather than a vector of metrics: c('PctCrop2001', 'PctCrop2019').
The request itself is agnostic to the formatting of the text. For example, these requests will also work: 'pctcrop2001, pctcrop2019' or 'PCTCROP2001,PCTCROP2019'.
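For example, a two-metric request might look like the sketch below (assuming comids holds a vector of NHDPlus COMIDs):

```r
library(StreamCatTools)
# Multiple metrics in one comma-separated string, not a character vector
sc_get_data(comid = comids,
            aoi = 'watershed',
            metric = 'PctCrop2001, PctCrop2019')
```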
StreamCat contains hundreds of metrics and we recommend consulting the metric list to identify those of interest for your study.
Accessing LakeCat
The R function to access LakeCat data was designed to parallel StreamCat functions
Let’s walk through an example together in which we:
Define an sf object of a sample point at Pelican Lake, WI
Obtain the lake waterbody polygon
Extract the COMID (unique ID) to query LakeCat
Pull data on mean watershed elevation, calcium oxide content of the geology, % sand and organic matter content of soils, and % of the watershed composed of deciduous forest
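The final step above might look like the sketch below, assuming pelican_comid holds the waterbody COMID identified in the preceding steps. The metric names here are our guesses at the relevant slugs; consult the StreamCat/LakeCat metric list to confirm them:

```r
library(StreamCatTools)
# Watershed elevation, CaO geology, soil sand/organic matter, % deciduous forest
lc_get_data(comid = pelican_comid,
            aoi = 'watershed',
            metric = 'elev, cao, sand, om, pctdecid2019')
```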
This analysis is based on a recent paper by Michael Dumelle and others published in the journal Spatial Statistics. The GitHub repository for this paper is also available
In addition to these static LakeCat data, we would also like to pull in data from specific years of NLCD to match sample years for % of watershed composed of crop area
To do this we will need to:
Grab LakeCat NLCD % crop data for years 2006, 2011, 2016
Clean and pivot the columns and add a year column
Add 1 to each year since available NLCD are 1 year behind field samples
Data Prep: LakeCat (Independent) Data
crop <-
  # Grab LakeCat crop data
  lc_get_data(comid = comids,
              aoi = 'watershed',
              metric = 'pctcrop2006, pctcrop2011, pctcrop2016') %>%
  # Remove watershed area from data
  select(-WSAREASQKM) %>%
  # Pivot table to long to create "year" column
  pivot_longer(!COMID, names_to = 'tmpcol', values_to = 'PCTCROPWS') %>%
  # Remove PCTCROP and WS to make "year" column
  mutate(year = as.integer(str_replace_all(tmpcol, 'PCTCROP|WS', ''))) %>%
  # Add 1 to each year to match NLA years
  mutate(year = factor(year + 1)) %>%
  # Remove the tmp column
  select(-tmpcol)
In this example, we will model the occurrence of the Argia damselfly in several northeastern states
Image from: inaturalist.org
finsyncR
To start this exercise, we’ll introduce a new R package called finsyncR. This package was developed by US EPA scientists and academic collaborators.
finsyncR provides us with access to thousands of biological samples across the U.S. from both EPA and USGS sources (e.g., Rumschlag et al. 2023). We’ll be using EPA data today
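A retrieval call along these lines (argument names and values are assumptions; see the finsyncR documentation) produces progress messages like those shown below:

```r
library(finsyncR)
# Genus-level macroinvertebrate data from EPA surveys (arguments assumed)
inverts <- getInvertData(dataType = "occur",
                         taxonLevel = "Genus",
                         agency = "EPA")
```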
finsyncR is running: Gathering, joining, and cleaning EPA raw data
finsyncR is running: Rarefying EPA data
finsyncR is running: Applying taxonomic fixes to EPA data
finsyncR is running: Finalizing data for output
finsyncR data synchronization complete
The prism package requires that we set a temporary folder in our workspace. Here, we set it to “prism_data” inside our “data” folder. It will create this folder if it does not already exist.
library(prism)
# Get these years of PRISM
years <- c(2013, 2014, 2018, 2019)
# Set the PRISM directory (creates directory if not present)
prism_set_dl_dir("./data/prism_data", create = TRUE)
# Download monthly PRISM rasters (tmean)
get_prism_monthlys('tmean', years = years, mon = 7:8, keepZip = FALSE)
# Create stack of downloaded PRISM rasters
tmn <- pd_stack(prism_archive_subset("tmean", "monthly", years = years, mon = 7:8))
# Extract tmean at sample points and massage data
tmn <- terra::extract(tmn,
                      # Transform taxon_rg to CRS of PRISM on the fly
                      taxon_rg %>% st_transform(crs = st_crs(tmn))) %>%
  # Add COMIDs to extracted values
  data.frame(COMID = comid_vect, .) %>%
  # Remove front and back text from PRISM year/month in names
  rename_with(~ stringr::str_replace_all(., 'PRISM_tmean_stable_4kmM3_|_bil', '')) %>%
  # Pivot to long table and call the column TMEANPRISMXXXXPT, where XXXX indicates year
  pivot_longer(!COMID, names_to = 'year_month', values_to = 'TMEANPRISMXXXXPT') %>%
  # Create new column of year
  mutate(year = year(ym(year_month))) %>%
  # Average July and August temperatures
  summarise(TMEANPRISMXXXXPT = mean(TMEANPRISMXXXXPT, na.rm = TRUE),
            .by = c(COMID, year))